ABSTRACT

In this note we formulate image segmentation as a clustering problem. Feature vectors extracted
from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy
generalization of Kohonen learning vector quantization (LVQ), which integrates the Fuzzy c-Means
(FCM) model with the learning rate and updating strategies of the LVQ, is used for this task.
This network, which segments images in an unsupervised manner, is thus related to the FCM 
optimization problem. Numerical examples on photographic and magnetic resonance images are 
given to illustrate this approach to image segmentation. 

1. INTRODUCTION 

Image segmentation divides an image into regions with uniform and homogeneous attributes such
as gray tone or texture [1]. Roughly speaking, conventional segmentation algorithms can be
divided into two classes: region-based schemes, wherein areas of images with homogeneous 
properties are found, which in turn gives region boundaries [2-4]; and edge-based schemes, where 
local discontinuities are detected first, and then connected to form longer, hopefully complete, 
boundaries [5]. Image segmentation should result in regions that cover semantically distinct 
visual entities and is a crucial step for subsequent recognition or interpretation tasks. 

Several image segmentation methods based on Markov Random Fields (MRFs) have been
proposed. The basic idea is to model spatial interaction of the image features by an MRF, which is a
probability distribution defined over a discrete random field. Hongo et al. [6] proposed a "multiple
level multiple resolution MRF" to detect edges, which was an extension of the work of Geman
and Geman [7]. This model incorporates a priori knowledge about global structures in images, but
can be implemented in a local (and parallel) mode. Three algorithms (simulated annealing, 
iterative conditional modes, and maximization of posterior marginals) are compared in [8]; all use 
MRF models to include prior contextual information. Most of these approaches use an energy 
function to guide image segmentation and numerical schemes for minimization of the energy 
functional. However, the search procedure for a global minimum (optimal solution) is usually time 
consuming. Moreover, edge-based segmentation schemes usually need a linking procedure to 
connect broken edges in order to make image subregions that have closed boundaries. Recently, 
several attempts to apply computational neural network architectures to image segmentation 
have been made. For example, edge detection has been formulated in the context of an
energy-minimizing model by eliminating weak boundaries and small segments [9], and also as a fuzzy
feed-forward computational neural network problem [10]. A neural network system capable of
detecting potential edges in various orientations that uses simulated and mean field annealing is 
discussed in [11]. 

In this note we propose using a new family of clustering algorithms called Fuzzy Learning Vector
Quantization (FLVQ) for image segmentation. FLVQ is a partial integration of Fuzzy c-Means (FCM)
and Kohonen clustering networks (LVQs). The block diagram of the process is shown in Fig. 1. 
Unlabeled feature vectors (one for each pixel) are first extracted from an image. Then FLVQ clusters 
these feature vectors to get cluster centers. Each cluster center is regarded as a prototype (or vector 
quantizer) of some subregion of the image. Finally, each pixel feature vector is compared to the 
cluster centers, and is assigned a constant value corresponding to the closest cluster center. Note 
that the number of constant values is the same as the number of clusters. 

Figure 1. FLVQ Image Segmentation: Overall Architecture.
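
The final labeling stage of this pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the cluster centers have already been found, uses the Euclidean distance, and the function name `segment_by_nearest_prototype` and the `levels` argument (the constant value painted for each cluster) are hypothetical.

```python
def segment_by_nearest_prototype(features, prototypes, levels=None):
    """Assign each pixel feature vector the constant value of its closest prototype.

    features   : list of feature vectors, one per pixel (raster order)
    prototypes : list of c cluster centers
    levels     : hypothetical list of c output values; defaults to 0..c-1
    """
    def sqdist(a, b):
        # squared Euclidean distance between two vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    if levels is None:
        levels = list(range(len(prototypes)))
    out = []
    for x in features:
        winner = min(range(len(prototypes)), key=lambda i: sqdist(x, prototypes[i]))
        out.append(levels[winner])
    return out


# Toy 2x3 image flattened row by row, one intensity feature per pixel.
pixels = [[10.0], [12.0], [200.0], [11.0], [198.0], [205.0]]
centers = [[11.0], [201.0]]          # assumed to come from a clustering step
labels = segment_by_nearest_prototype(pixels, centers, levels=[0, 255])
print(labels)  # [0, 0, 255, 0, 255, 255]
```

The output contains exactly as many distinct values as there are clusters, as noted above.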

The remainder of this paper is organized as follows. In the next section, we briefly review the FCM, 
LVQ and FLVQ algorithms. In Section 3, experimental segmentation results on photographic and 
Magnetic Resonance images are reported. Section 4 contains a discussion, conclusions, and some 
ideas for future research. 

2. KOHONEN CLUSTERING NETWORKS 

Many classical clustering algorithms can be found in the texts of Duda and Hart [12], Hartigan [13],
and Jain and Dubes [14]. In [15] Lippmann suggested that Kohonen's learning vector quantization
(LVQ) [16] is closely related to the sequential Hard c-Means (HCM) algorithm. Fuzzy c-Means (FCM)
is a well known generalization of HCM [17,18]. Since HCM/FCM are optimization procedures,
whereas LVQ is not, integration of FCM and LVQ is one way to address several problems of LVQs
while simultaneously attacking the general problem of how the two families are related.
Huntsberger and Ajjimarangsee [19] first considered this approach, and their idea was extended in
[20] to the FLVQ algorithms described below.

Let c be an integer, $1 < c < n$, and let $X = \{x_1, x_2, \ldots, x_n\}$ denote a set of n feature vectors in $\Re^p$. X is
numerical object data; the j-th object has vector $x_j$ as its numerical representation, and $x_{jk}$ is the
k-th characteristic (or feature) associated with object j. Given X, we say that c fuzzy subsets
$\{u_i : X \to [0,1]\}$ are a constrained fuzzy c-partition of X in case the cn values
$\{u_{ik} = u_i(x_k),\ 1 \le k \le n,\ 1 \le i \le c\}$ satisfy three conditions:

$$0 \le u_{ik} \le 1 \quad \text{for all } i, k; \tag{1a}$$

$$\sum_i u_{ik} = 1 \quad \text{for all } k; \tag{1b}$$

$$0 < \sum_k u_{ik} < n \quad \text{for all } i. \tag{1c}$$

Here $u_{ik}$ is interpreted as the membership of $x_k$ in the i-th partitioning subset (cluster) of X. If all
of the $u_{ik}$'s are in $\{0, 1\}$, $U = [u_{ik}]$ is a conventional (crisp, hard) c-partition of X. The most well
known objective function for clustering in X is the classical within groups sum of squared errors
function, defined as:

$$J_1(U, v; X) = \sum_k \sum_i u_{ik} \, \|x_k - v_i\|^2, \tag{2}$$

where $v = (v_1, v_2, \ldots, v_c)$ is a vector of (unknown) cluster centers (weights, prototypes, or vector
quantizers), $v_i \in \Re^p$ for $1 \le i \le c$, and U is a hard or conventional c-partition of X. Optimal
partitions U* of X are taken from pairs (U*, v*) that are "local minimizers" of $J_1$. Dunn [18] first
generalized (2) for m=2, and subsequently, Bezdek [17] generalized (2) to the infinite family written
as:

$$J_m(U, v; X) = \sum_k \sum_i (u_{ik})^m \, \|x_k - v_i\|_A^2, \tag{3}$$

where $m \in [1, \infty)$ is a weighting exponent on each fuzzy membership, U is a fuzzy c-partition of X,
$v = (v_1, v_2, \ldots, v_c)$ are cluster centers in $\Re^p$, A is any positive definite (p x p) matrix, and
$\|x_k - v_i\|_A = \sqrt{(x_k - v_i)^T A (x_k - v_i)}$ is the distance (in the A norm) from $x_k$ to $v_i$. Conditions that
are necessary for extrema of $J_1$ and $J_m$ follow.

Hard c-Means (HCM) Theorem [17]: (U, v) may minimize $\sum_k \sum_i u_{ik} \|x_k - v_i\|_A^2$ only if:

$$u_{ik} = \begin{cases} 1; & \|x_k - v_i\|_A^2 = \min_j \{\|x_k - v_j\|_A^2\} \\ 0; & \text{otherwise} \end{cases} \tag{4a}$$

$$v_i = \sum_k u_{ik} x_k \Big/ \sum_k u_{ik} \tag{4b}$$

In the context of image segmentation, equation (4a) will be used to assign each (pixel) vector to
its closest prototype $v_i$; this is the essence of our segmentation scheme. Note that HCM produces
a partition U that contains hard clusters. The well known generalization of HCM is contained in
the following.

Fuzzy c-Means (FCM) Theorem [17]: Assume $\|x_k - v_i\|_A^2 > 0$ for all i, k at each iteration
of (5): (U, v) may minimize $\sum_k \sum_i (u_{ik})^m \|x_k - v_i\|_A^2$ for m > 1 only if:

$$u_{ik} = \left( \sum_j \left( \frac{\|x_k - v_i\|_A}{\|x_k - v_j\|_A} \right)^{2/(m-1)} \right)^{-1} \tag{5a}$$

$$v_i = \sum_k (u_{ik})^m x_k \Big/ \sum_k (u_{ik})^m \tag{5b}$$

Conditions (5) → (4) and $J_m \to J_1$ as m → 1 from above. The FCM (HCM) algorithms are iterative
procedures for approximately minimizing $J_m$ ($J_1$) by Picard iteration through (5) or (4),
respectively. c-Means algorithms are non-sequential algorithms: updates on the weights $\{v_{i,t}\}$ are
performed after each pass through X. Thus, the iterate sequence $\{v_{i,t}\}$ is independent of the sequence of
the data labels. The parameter m essentially controls the "amount of fuzziness" in U. As m → ∞,
$u_{ik,t} \to 1/c$; when m → 1 from above, $u_{ik,t} \to 1$ or 0.
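
The Picard iteration through the FCM necessary conditions can be sketched as follows. This is only an illustrative sketch under stated assumptions: the norm is Euclidean (A = I), all distances stay strictly positive as the theorem requires, and the function names, the stopping tolerance `eps`, and the toy data are hypothetical choices rather than anything taken from the experiments.

```python
def fcm_memberships(X, V, m):
    """Membership update (5a) for data X and centers V, with A = I (assumed)."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(X):
        # Euclidean distances from x_k to every center; assumed to be > 0
        d = [sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5 for v in V]
        for i in range(c):
            U[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
    return U


def fcm_centers(X, U, m):
    """Center update (5b): means of the data weighted by (u_ik)^m."""
    p = len(X[0])
    V = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        V.append([sum(wk * x[q] for wk, x in zip(w, X)) / total for q in range(p)])
    return V


def fcm(X, V, m=2.0, eps=1e-6, max_iter=100):
    """Alternate (5a) and (5b) until the centers stop moving (1-norm change < eps)."""
    for _ in range(max_iter):
        U = fcm_memberships(X, V, m)
        V_new = fcm_centers(X, U, m)
        change = sum(abs(a - b) for v, w in zip(V, V_new) for a, b in zip(v, w))
        V = V_new
        if change < eps:
            break
    return U, V


X = [[0.0], [1.0], [9.0], [10.0]]          # toy one-dimensional data
U, V = fcm(X, [[1.5], [7.0]])              # arbitrary initial centers
```

Each column of the returned U sums to 1, as constraint (1b) requires, and the centers are updated only after a full pass through X, which is what makes the scheme non-sequential.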

Kohonen clustering networks (LVQs) are unsupervised schemes which find the "best" set of
prototypes (for hard clusters) in an iterative, sequential manner. The structure of LVQ consists of
two layers: an input (fanout) layer, and an output (competitive) layer, as shown in Fig. 2. The edges
that connect the p input nodes to the c output nodes do not have "weights" attached to them, as, for
example, in a feed forward network architecture. Instead, each output node has a prototype (vector
quantizer) attached to it, and it is this set of network weight vectors that are adjusted during
learning. A formal description of LVQ is given below. There are other versions of LVQ; this one is
usually regarded as the "standard" form.

The LVQ Clustering Algorithm [16] 


LVQ1. Given unlabeled data set $X = \{x_1, \ldots, x_n\} \subset \Re^p$. Fix c, T, and $\varepsilon > 0$.

LVQ2. Initialize $V_0 = (v_{1,0}, \ldots, v_{c,0}) \in \Re^{cp}$, and learning rate $\alpha_0 \in (0,1)$.

LVQ3. For t = 1, 2, ..., T;

    For k = 1, 2, ..., n:

        a. Find $\|x_k - v_{i,t-1}\| = \min_j \{\|x_k - v_{j,t-1}\|\}$.    (6)

        b. Update the winner: $v_{i,t} = v_{i,t-1} + \alpha_t (x_k - v_{i,t-1})$.    (7)

    Next k

    d. Apply the 1-NP (nearest prototype) rule to the data:

       $u^{LVQ}_{ik} = \begin{cases} 1; & \|x_k - v_i\| \le \|x_k - v_j\|,\ 1 \le j \le c,\ j \ne i \\ 0; & \text{otherwise} \end{cases}$, $1 \le i \le c$ and $1 \le k \le n$.    (8)

    e. Compute $E_t = \|V_t - V_{t-1}\|_1 = \sum_r \sum_k |v_{rk,t} - v_{rk,t-1}|$.

    f. If $E_t \le \varepsilon$ stop; Else adjust learning rate $\alpha_t$;

Next t
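
The loop structure of steps LVQ1 to LVQ3 can be sketched as below. This is a hedged illustration, not the reference implementation: the schedule `alpha0 / t` is one assumed way to "adjust the learning rate", the winner search uses the most recently updated prototypes (the usual sequential reading of the algorithm), and the function name and toy data are hypothetical.

```python
def lvq_cluster(X, V0, T=50, eps=1e-4, alpha0=0.5):
    """Sequential LVQ clustering sketch with Euclidean distance.

    alpha_t = alpha0 / t is an assumed learning-rate schedule.
    """
    V = [list(v) for v in V0]
    for t in range(1, T + 1):
        alpha = alpha0 / t
        V_prev = [list(v) for v in V]
        for x in X:
            # step a, eq. (6): the winner is the prototype closest to x
            i = min(range(len(V)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, V[j])))
            # step b, eq. (7): only the winner moves toward x
            V[i] = [v + alpha * (a - v) for a, v in zip(x, V[i])]
        # step e: 1-norm of the change in all prototypes over this pass
        E = sum(abs(a - b) for v, w in zip(V, V_prev) for a, b in zip(v, w))
        # step f: stop when successive prototype sets are close
        if E <= eps:
            break
    return V


X = [[0.0], [1.0], [9.0], [10.0]]           # toy one-dimensional data
V = lvq_cluster(X, [[2.0], [8.0]])           # arbitrary initial prototypes
```

The hard 1-NP partition of step d is omitted from the loop here because, as the text goes on to note, the prototype sequence does not depend on it; it can be computed once after termination.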



Figure 2. The structure of a Kohonen clustering network.


The numbers $U_{LVQ} = [u^{LVQ}_{ik}]$ at (8) are a c x n matrix that almost always (constraint (1c) may not be
satisfied) define a hard c-partition of X using the 1-NP classifier assignment rule at (4). Our
inclusion of computation of the hard 1-NP c-partition of X at the end of each pass through the data
(step LVQ3.d) is not part of the LVQ algorithm; that is, the LVQ iterate sequence does not depend on
cycling through U's. Ordinarily this computation is done once, non-iteratively, outside and after
termination of LVQ. Note that LVQ uses the Euclidean distance in step LVQ3.a. This choice
corresponds roughly to the update rule shown in (7), since $\nabla_v (\|x - v\|^2) = -2I(x - v) = -2(x - v)$. The
origin of this rule assumes that each $x \in \Re^p$ is distributed according to a probability density
function f(x). LVQ's objective is to find a set of $v_i$'s to minimize the expected value of the square
of the discretization error:

$$E(\|x - v_i\|^2) = \int\!\!\int \cdots \int \|x - v_i\|^2 f(x)\, dx \tag{9}$$

In this expression $v_i$ is the winning prototype for each x, and will of course vary as x ranges over
$\Re^p$. A sample function of this optimization problem is $e = \|x - v_i\|^2$. An optimal set of $v_i$'s can be
approximated by applying local gradient descent to a finite set of samples drawn from f. The
extant theory for this scheme is contained in [21], which states that LVQ converges in the sense
that the prototypes $V_t = (v_{1,t}, v_{2,t}, \ldots, v_{c,t})$ generated by the LVQ iterate sequence converge, i.e.,
$\{V_t\} \to \hat{V}$ as $t \to \infty$, provided two conditions are met by the sequence $\{\alpha_t\}$ of learning rates used in (7):

$$\sum_{t=0}^{\infty} \alpha_t = \infty \quad \text{and} \quad \sum_{t=0}^{\infty} \alpha_t^2 < \infty. \tag{10}$$

One choice for the learning rates that satisfies these conditions is the harmonic sequence
$\alpha_t = 1/t$ for $t \ge 1$; $\alpha_0 \in (0,1)$. Kohonen has shown that (under some assumptions) steepest descent
optimization of the average expected error function (9) is possible, and leads to update rule (7). The
update scheme at (7) has the simple geometric interpretation shown in Figure 3.
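
As a quick check of the two learning-rate conditions for the harmonic sequence (a standard calculus fact, stated here for convenience rather than taken from [21]; the finite term $\alpha_0$ does not affect either condition):

$$\sum_{t=1}^{\infty} \frac{1}{t} = \infty, \qquad \sum_{t=1}^{\infty} \frac{1}{t^2} = \frac{\pi^2}{6} < \infty .$$

By contrast, a constant rate $\alpha_t = a > 0$ violates the second condition because $\sum_t a^2$ diverges, and $\alpha_t = 1/t^2$ violates the first because $\sum_t 1/t^2$ is finite.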



Figure 3. Updating the winning LVQ Prototype. 

The winning prototype at iteration t, $v_{i,t-1}$, is simply shifted towards the current data point by
moving along the vector $(x_k - v_{i,t-1})$ which connects it to $x_k$. The amount of shift depends on the
value of a "learning rate" parameter $\alpha_t$, which varies from 0 to 1. As seen in Figure 3, there is no
update if $\alpha_t = 0$, and when $\alpha_t = 1$, $v_{i,t}$ becomes $x_k$ ($v_{i,t}$ is just a convex combination of $x_k$ and $v_{i,t-1}$).

This process continues until termination via LVQ3.f, when the terminal prototypes yield a "best"
hard c-partition of X via (8).

Comments on LVQ: (1) Kohonen in [21] mentions that LVQ converges to a unique limit if and only
if conditions (10) are satisfied. However, nothing was said about what sort or type of points the
final weight vectors produced by LVQ are. Since LVQ does not model a well defined property of
clusters (in fact, LVQ does not maintain a partition of the data at all), the fact that $\{V_t\} \to \hat{V}$ as $t \to \infty$
does not ensure that the limit vector $\hat{V}$ is a good set of prototypes in the sense of representation of
clusters or clustering tendencies. (2) The termination strategy at LVQ3.e is based on small
successive changes in the cluster centers. This method of algorithmic control offers the best set of
centroids for compact representation (quantization) of the data in each cluster. However, LVQ
seldom terminates in less than, say, 20,000 iterates unless $\alpha_t \to 0$; this forces it to stop because
successive iterates are necessarily close. (3) LVQ often runs to its iterate limit, and sometimes
passes the optimal (clustering) solution in terms of minimal apparent label error rate. This is
called the "over-training" phenomenon in the neural network literature.

Huntsberger and Ajjimarangsee [19] combined the 1-NP rule at (4) with Self-Organizing Feature
Maps (SOFMs) to develop clustering algorithms. Algorithm 1 in [19] is the SOFM algorithm with
an additional layer of neurons that does not participate in weight updating. After the
self-organizing network terminates, the additional layer, for each input, finds the weight vector
(prototype) closest to it and assigns the input data point to that class. A second algorithm in their
paper used the necessary conditions for FCM to assign a membership value in [0,1] to each data
point for each of the c classes. Specifically, Huntsberger and Ajjimarangsee suggested
fuzzification of SOFM by replacing the learning rates $\{\alpha_{ik}\}$ usually found in rules such as (7) with
fuzzy membership values $\{u_{ik}\}$ computed with the FCM formula [17]:

$$u_{ik} = \left( \sum_j \left( \frac{D_{ik}}{D_{jk}} \right)^{2/(m-1)} \right)^{-1}, \tag{11}$$

where $D_{ik} = \|x_k - v_i\|_A$. Numerical results reported in Huntsberger and Ajjimarangsee suggest
that in many cases their algorithms and standard LVQ produce very similar answers. Their
scheme was a partial integration of LVQ with FCM that showed some interesting results. However,
it fell short of realizing a model for fuzzy LVQ clustering, and no properties regarding terminal
points or convergence were established. Moreover, since the objective of LVQ is to find cluster
centers (prototypes) in $\Re^p$, the need for and use of the topological ordering idea of (images of) the
weight vectors in display space is not well justified. Consequently, the approach taken in [19]
seems to mix two objectives, feature mapping and clustering, and the overall methodology is
difficult to interpret in either sense.


Integration of FCM with LVQ can be more fully realized by defining the learning rate for Kohonen
updating as:

$$\alpha_{ik,t} = (u_{ik,t})^{m_t}, \quad \text{where} \tag{12a}$$

$$m_t = m_0 + t\,[(m_f - m_0)/T] = m_0 + t\,\Delta m; \quad m_f, m_0 \ge 1; \quad t = 1, 2, \ldots, T. \tag{12b}$$


$m_t$ replaces the (fixed) parameter m in (11). This results in three families of Fuzzy LVQ or FLVQ
algorithms, the cases arising by different treatments of parameter $m_t$. In particular, for
$t \in \{1, 2, \ldots, T\}$, we have three cases depending on choice of the initial ($m_0$) and final ($m_f$) values of
m:

1. $m_0 > m_f \Rightarrow \{m_t\} \downarrow m_f$ : Descending FLVQ    (13a)

2. $m_0 < m_f \Rightarrow \{m_t\} \uparrow m_f$ : Ascending FLVQ    (13b)

3. $m_0 = m_f \Rightarrow m_t \equiv m_0 \equiv m$ : FLVQ ≡ FCM    (13c)

Cases 1 and 3 are discussed at length in [20]. Equation (13c) asserts that when $m_0 = m_f$, FLVQ
reverts to FCM; this results from defining the learning rates via (12a), and using them in the
update rule for the prototypes shown in FLVQ3.b below. We provide a formal description of FLVQ:

Fuzzy LVQ (FLVQ) [20] 


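FLVQ1. Given unlabeled data set $X = \{x_1, x_2, \ldots, x_n\}$. Fix c, T, $\|\cdot\|_A$ and $\varepsilon > 0$.

FLVQ2. Initialize $V_0 = (v_{1,0}, \ldots, v_{c,0}) \in \Re^{cp}$. Choose $m_0, m_f > 1$.

FLVQ3. For t = 1, 2, ..., T.

    a. Compute all (cn) learning rates $\{\alpha_{ik,t}\}$ with (12).

    b. Update all (c) weight vectors $\{v_{i,t}\}$ with

       $v_{i,t} = v_{i,t-1} + \sum_k \alpha_{ik,t} (x_k - v_{i,t-1}) \Big/ \sum_s \alpha_{is,t}$

    c. Compute $E_t = \|V_t - V_{t-1}\| = \sum_i \|v_{i,t} - v_{i,t-1}\|$.

    d. If $E_t \le \varepsilon$ stop; Else

Next t.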
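
A compact sketch of steps FLVQ1 to FLVQ3 follows. It is an illustration under stated assumptions rather than the authors' code: the norm is Euclidean (A = I), distances are clipped at a small positive value to avoid division by zero (an added safeguard not in the algorithm statement), the change in prototypes is measured in the 1-norm, and the function name, default parameters, and toy data are hypothetical.

```python
def flvq(X, V0, m0=3.0, mf=1.5, T=50, eps=1e-4):
    """Fuzzy LVQ sketch: descending when m0 > mf, ascending when m0 < mf."""
    V = [list(v) for v in V0]
    c, p = len(V), len(X[0])
    dm = (mf - m0) / T
    for t in range(1, T + 1):
        m_t = m0 + t * dm                                  # schedule (12b)
        # step a: learning rates alpha_ik = (u_ik)^m_t, with u_ik from the
        # FCM membership formula evaluated at exponent m_t
        alpha = [[] for _ in range(c)]
        for x in X:
            d = [max(sum((a - b) ** 2 for a, b in zip(x, v)) ** 0.5, 1e-12)
                 for v in V]                               # clipped (assumption)
            for i in range(c):
                u = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m_t - 1.0))
                              for j in range(c))
                alpha[i].append(u ** m_t)
        # step b: every prototype is moved, after the full pass through X
        V_new = []
        for i in range(c):
            s = sum(alpha[i])
            V_new.append([
                V[i][q] + sum(a * (x[q] - V[i][q]) for a, x in zip(alpha[i], X)) / s
                for q in range(p)
            ])
        # steps c and d: stop on small successive changes in the prototypes
        E = sum(abs(a - b) for v, w in zip(V, V_new) for a, b in zip(v, w))
        V = V_new
        if E <= eps:
            break
    return V


X = [[0.0], [1.0], [9.0], [10.0]]            # toy one-dimensional data
V = flvq(X, [[2.0], [8.0]])                   # arbitrary initial prototypes
```

With `m0 == mf` the exponent is constant and each pass reduces to one FCM center update, which matches case (13c).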


Observe that FLVQ is not a direct fuzzy generalization of LVQ because it does not revert to LVQ in
case all of the $u_{ik,t}$'s are either 0 or 1 (the crisp case). Instead, if $m_0 = m_f = 1$, FCM reverts to HCM,
and the HCM prototype update formula, which is driven by finding unique winners, as in LVQ, is a
different formula than (7). Nonetheless, FLVQ is perhaps the closest possible link between LVQ and
c-Means type algorithms. For fixed c, $\{v_{i,t}\}$ and $m_t$, the learning rates $\{\alpha_{ik,t}\} = \{(u_{ik,t})^{m_t}\}$ at (12a)
satisfy the following:

$$\alpha_{ik,t} = \kappa \Big/ \|x_k - v_{i,t-1}\|_A^{\,2 m_t/(m_t - 1)}, \tag{14}$$

where $\kappa$ is a positive constant. Apparently the contribution of $x_k$ to the next update of the node
weights is inversely proportional to their distances from it. The "winner" is the $v_{i,t-1}$ closest to
$x_k$, and it will be moved further along the line connecting $v_{i,t-1}$ to $x_k$ than any of the other weight
vectors. Since $\sum_i u_{ik,t} = 1 \Rightarrow \sum_i \alpha_{ik,t} \le 1$, this amounts to distributing partial updates across all c nodes
for each $x_k \in X$. This is in sharp contrast to LVQ, where only the winner is updated for each data
point.
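
The bound on the sum of the learning rates follows in one line from $0 \le u_{ik,t} \le 1$ and $m_t \ge 1$, which give $(u_{ik,t})^{m_t} \le u_{ik,t}$:

$$\sum_{i=1}^{c} \alpha_{ik,t} = \sum_{i=1}^{c} (u_{ik,t})^{m_t} \le \sum_{i=1}^{c} u_{ik,t} = 1 .$$

As an illustrative example with made-up numbers, take $c = 2$, memberships $(0.8, 0.2)$ and $m_t = 2$: the learning rates are $(0.64, 0.04)$, which sum to $0.68 < 1$, and the winner receives sixteen times the rate of the other node even though its membership is only four times larger.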



Figure 4. Updating Feature Space Prototypes in FLVQ Clustering Nets. 

Figure 4 illustrates the update geometry of FLVQ; note that every node is (potentially) updated at
every iteration, and the sum of the learning rates is always less than or equal to 1, an added
constraint on the overall movement of the c prototypes at each t. In descending FLVQ (13a), for
large values of $m_t$ (near $m_0$), all c nodes are updated with lower individual learning rates, and as
$m_t \downarrow m_f$, more and more of the update is given to the "winner" node. In other words, the lateral
distribution of learning rates is a function of t, which in the descending case "sharpens" at the
winner node (for each $x_k$) as $m_t \to m_f$.

Comments on FLVQ: (1) In contradistinction to Huntsberger and Ajjimarangsee's approach, there
is no need to choose an update neighborhood. Neighborhood control is automatic, and depends
entirely on the relative geometry of the data and their prototypes. (2) Reduction of the learning
coefficient with distance (either topological or in $\Re^p$) from the winner node is not required.
Instead, reduction is done automatically and adaptively by the learning rules. (3) The greater the
mismatch to the winner (i.e., the higher the quantization error), the smaller the impact to the
weight vectors associated with other nodes (recall (14)). (4) The learning process attempts to
minimize a well-defined objective function (stepwise). This procedure depends on generation of a
fuzzy c-partition of the data, so it is an iterative clustering model; indeed, stepwise, it is exactly
fuzzy c-means [20]. (5) Our termination strategy is based on small successive changes in the cluster
centers. This method of algorithmic control offers the best set of centroids for compact
representation (quantization) of the data in each cluster.

3. EXPERIMENTAL RESULTS

[...] of any technique as a tool for image segmentation depends largely on the choice of useful
feature vectors. We first discuss the application of FLVQ to segmentation of intensity images, and
then to MR images. For a digital intensity image, every pixel is usually represented by a feature
vector derived from pixel statistics like the mean, standard deviation [...] simple feature [...]
illustrate FLVQ [...] in a systematic manner, starting from the top left corner [...]

Table 1. Protocols for the Computational Experiments



Fig. 5(a) An intensity image          Fig. 5(b) Segmentation result using 1x1


Figure 5(a) depicts the intensity image of a house. Figures 5(b), (c) and (d) represent segmentation
results produced by windows of sizes 1x1, 3x3 and 5x5 [...] equivalent to histogram thresholding.
Comparison of Figure 5(c) with 5(b) reveals that the roof and the walls of the house are better
segmented by the 3x3 window. On the other hand, Figure 5(d) contains more compact segmented
regions; even the textured tree is segmented as compact homogeneous regions. This shows that too
small a window may result in too many details, while too large a window may smooth out much
relevant information. Probably a reasonably good compromise is a neighborhood of size 3x3.
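
To make the role of the window size concrete, the sketch below builds one feature vector per pixel from a w x w neighborhood. The exact features used in the experiments are not recoverable from the text above, so this is an assumption for illustration only: the feature vector is taken to be the raw intensities in the window, borders are handled by replicating edge pixels, and the function name `window_features` is hypothetical.

```python
def window_features(image, w):
    """Return one feature vector per pixel, in raster order from the top left.

    Assumed feature: the w*w intensities of the pixel's neighborhood,
    with border pixels replicated where the window leaves the image.
    """
    rows, cols = len(image), len(image[0])
    r = w // 2
    feats = []
    for i in range(rows):
        for j in range(cols):
            vec = []
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii = min(max(i + di, 0), rows - 1)   # clamp row index
                    jj = min(max(j + dj, 0), cols - 1)   # clamp column index
                    vec.append(float(image[ii][jj]))
            feats.append(vec)
    return feats


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
f1 = window_features(image, 1)   # 1x1 window: the pixel intensity alone
f3 = window_features(image, 3)   # 3x3 window: nine intensities per pixel
print(f1[4], f3[0])
# [5.0] [1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 4.0, 4.0, 5.0]
```

With a 1x1 window each feature vector is a single gray level, so clustering operates on intensities alone; larger windows bring in spatial context at the cost of smoothing fine detail.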




Fig. 5(c) Segmentation result using 3x3 Fig. 5(d) Segmentation result using 5x5 


If q images are correlated in the sense that they are perfectly registered because they are taken in
different bands, pixel vectors of size q can be erected at each spatial site by simply aggregating the
intensity across bands. This amounts to a multichannel version of the 1x1 window. Magnetic
Resonance Imagery, e.g., typically generates 3 bands, namely, T1 relaxation (spin lattice), T2
relaxation (transverse), and $\rho$ (proton density). At pixel site (i,j), MRI data can thus result in
3-dimensional pixel vectors, say $x_{ij} = (T1_{ij}, T2_{ij}, \rho_{ij})$. This $x_{ij}$ can then be used as a feature vector for
segmentation of the MR image. Figures 6(a) and 6(b) show two bands ($\rho$ and T2) of one physical slice
of a human head. Fig. 6(c) depicts the segmentation obtained using FLVQ with the parameters
shown in the last row of Table 1. It is well known that comparison of image segmentation
algorithms is not an easy task [8]. However, one of the most important criteria for performance
evaluation is whether the algorithm can outline the desired or important components in the
image. For instance, in Fig. 6(c), our segmentation delineates the white and gray matter tissue
regions quite well.
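
Aggregating registered bands into pixel vectors is a simple stacking operation, sketched below. The function name `stack_bands` and the tiny 2x2 "bands" are hypothetical and only illustrate the data layout; real MR slices would be read from image files.

```python
def stack_bands(*bands):
    """Build one q-dimensional vector per pixel site from q registered bands.

    Each band is a list of rows; all bands must have the same shape.
    Vectors are returned in raster order.
    """
    rows, cols = len(bands[0]), len(bands[0][0])
    for b in bands:
        if len(b) != rows or any(len(row) != cols for row in b):
            raise ValueError("all bands must have the same shape")
    return [[float(b[i][j]) for b in bands]
            for i in range(rows) for j in range(cols)]


# Made-up 2x2 bands standing in for T1, T2 and proton density images.
t1  = [[1, 2], [3, 4]]
t2  = [[5, 6], [7, 8]]
rho = [[9, 10], [11, 12]]
vectors = stack_bands(t1, t2, rho)
print(vectors[0])  # [1.0, 5.0, 9.0]
```

The resulting list can be passed directly to any of the clustering procedures described in Section 2, since they only require a set of feature vectors of equal length.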



Fig. 6(a) $\rho$ MR data


Fig. 6(b) T2 MR data Fig. 6(c) FLVQ Segmentation 

4. CONCLUDING REMARKS

In this paper a family of fuzzy generalizations of LVQ (the FLVQ algorithms), based on the integration
of Fuzzy c-Means and Kohonen clustering networks, has been used for image segmentation. FLVQ
is non-sequential, unsupervised, and uses fuzzy membership values from FCM as learning rates.
This yields automatic control of both the learning rate distribution and update neighborhood.
Light intensity and MR images have been segmented using various feature extraction strategies;
our results seem encouraging, but much remains to be done.